You are looking at content from Sapping Attention, which was my primary blog from 2010 to 2015; I am republishing all items from there on this page, but for the foreseeable future you should be able to read them in their original form at sappingattention.blogspot.com. For current posts, see here.

Posts with tag Digital Humanities


Mar 06 2012

I just saw that various Digital Humanists on Twitter were talking about representativeness, the exclusion of women from digital archives, and other Big Questions. I can only echo my general agreement with most of the comments.

Feb 20 2012
1. The new USIH blogger LD Burnett has a post up expressing ambivalence about the digital humanities because it is too eager to reject books. This is a pretty common argument, I think, familiar to me in less eloquent forms from New York Times comment threads. It's a rhetorically appealing position: to set oneself up as a defender of the book against the philistines who not only refuse to read it themselves but want to take your books away and destroy them. I worry there's some mystification involved: conflating corporate publishers with digital humanists, lumping together books, codices, and monographs, and ignoring the tension between reader and consumer. This problem ties in nicely with the big event in DH in the last week: the announcement of the first issue of the ambitiously all-digital Journal of Digital Humanities. So let me take a minute away from writing about TV shows to sort out my preliminary thoughts on books.

Jun 16 2011

Let me get back into the blogging swing with a (too long; this is why I can't handle Twitter, folks) reflection on an offhand comment. Don't worry, there's some data stuff in the pipe, maybe including some long-delayed playing with topic models.

Apr 13 2011

All the cool kids are talking about shortcomings in digitized text databases. I don't have anything as detailed to say as Goose Commerce or Shane Landrum, but I do have one fun fact. Those guys describe ways that projects miss things we might think are important but that lie just outside the most mainstream interests: the neglected Early Republic in newspapers, letters to the editor in journals, etc. They raise the important point that digital resources are nowhere near as comprehensive as we sometimes think, which is a big caveat we all need to keep in mind. I want to point out that it's not just at the margins that we're missing texts: omissions are also, maybe surprisingly, lurking right at the heart of the canon. Here's an example.

Apr 03 2011

Shane Landrum (@cliotropic) says my claim that historians have different digital infrastructural needs than other fields might be provocative. I don't mean this as exceptionalism for historians, particularly not compared to other humanities fields. I do think historians are somewhat exceptional in the volume of texts they want to process (at Princeton, they often gloat about being the heaviest users of the library). I do think this volume is one important reason English has a more advanced field of digital humanities than history does. But the needs are independent of the volume, and every academic field has distinct needs. Data, though, is often structured either for one set of users or for a mushy middle.

Mar 02 2011

url: /2011/03/what-historians-dont-know-about.html

Feb 20 2011

I wanted to see how well the vector space model of documents I've been using for PCA works at classifying individual books. [Note at the outset: this post swings back from the technical stuff about halfway through, if you're sick of the charts.] While at the genre level the separation looks pretty nice, some of my earlier experiments with PCA, as well as some of what I read in the Stanford Literary Lab's Pamphlet One, made me suspect individual books would be sloppier. There are a couple of different ways to ask this question. One is to just drop the books as individual points on top of the separated genres, so we can see how they fit into the established space. Using the first two principal components, for example, we can color all the books in LCC subclass BF (psychology) blue and those in QE (geology) red, overlaying them on a chart of the first two principal components like the ones I've been using for the last two posts:
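The "fit the space on genres, then drop books into it" idea can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions, not the code behind these posts: it assumes each document is a row of raw word counts over a shared vocabulary, normalizes rows to relative frequencies, and computes PCA via an SVD of the centered genre-level matrix. The function names, the toy matrices, and the `COLORS` mapping are all hypothetical.

```python
import numpy as np

def relative_freqs(counts):
    """Turn raw word counts (docs x terms) into per-document relative frequencies."""
    X = np.asarray(counts, dtype=float)
    return X / X.sum(axis=1, keepdims=True)

def fit_pca(reference_counts, k=2):
    """Find the top-k principal axes of a reference set (e.g. genre-level documents).

    Returns the column means used for centering and a (k, n_terms) array whose
    rows are orthonormal principal axes, ordered by decreasing variance.
    """
    X = relative_freqs(reference_counts)
    mean = X.mean(axis=0)
    # Rows of vt from the SVD of the centered matrix are the principal axes.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def project(counts, mean, axes):
    """Drop new documents (e.g. individual books) into the already-fitted space."""
    return (relative_freqs(counts) - mean) @ axes.T

# Hypothetical subclass-to-color mapping for an overlay like the one described.
COLORS = {"BF": "blue", "QE": "red"}

# Toy data: 4 genre-level documents and 2 individual books over a 3-word vocabulary.
genres = np.array([[10, 0, 5], [8, 2, 5], [0, 10, 5], [1, 9, 6]])
books = np.array([[9, 1, 5], [1, 8, 5]])

mean, axes = fit_pca(genres)
print(project(books, mean, axes))
```

Fitting on the genres and only projecting the books keeps the axes fixed, so individual books are judged against the established space rather than reshaping it. A plotting library could then scatter each book's two coordinates in the color for its subclass.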

Feb 11 2011

I've spent a lot of the last week trying to convince Princeton undergrads it's OK to occasionally disagree with each other, even if they're not sure they're right. So let me note one of the places I've felt a little skepticism as I try to figure out what's going on with the digital humanities.

Jan 21 2011

In writing about openness and the ngrams database, I found it hard not to reflect a little on the role of copyright in all this. I've called 1922 the year digital history ends before; for the kind of work I want to see, it's nearly an insuperable barrier, and it's one I think not enough non-tech-savvy humanists think about. So let me dig in a little.

Jan 20 2011

The Culturomics authors released a FAQ last week that responds to many of the questions floating around about their project. I should, by trade, be most interested in their responses to the lack of humanist involvement. I'll get to that in a bit. But instead, I find myself thinking more about what the requirements of openness are going to be for textual research.

Dec 04 2010

Patricia Cohen's new article about the digital humanities doesn't come with the rafts of crotchety comments the first one did, so unlike last time I'm not in a defensive crouch. To the contrary: I'm thrilled and grateful that Dan Cohen, the main subject of the article, took the time in his moment in the sun to link to me. The article itself is really good, not just because the Cohen-Gibbs Victorian project is so exciting, but because P. Cohen gets some thoughtful comments and the NYT graphic designers, as always, do a great job. So I just want to focus on the Google connection for now, and then I'll post my versions of the charts the Times published.

Dec 03 2010

Dan Cohen, the hub of all things digital history, in the news and on his blog.

Dec 02 2010

Jamie's been asking for some thoughts on what it takes to do this: statistics backgrounds, etc. I should say that I'm doing this, for the most part, the hard way, because 1) my database is too large to start out using most tools I know of, including, I think, the R text-mining package, and 2) I want to understand better how it works. I don't think I'm going to do the software review thing here, but there are what look like a _lot_ of promising leads at an American Studies blog.

Dec 01 2010

I've had "digital humanities" in the blog's subtitle for a while, but it's a terribly off-putting term. I guess it's supposed to evoke future frontiers and universal dissemination of humanistic work, but it carries an unfortunate implication that the "analog humanities" are something completely different. It makes them sound older, richer, more subtle, and scheduled for demolition. No wonder a world of online exhibitions and digital texts doesn't appeal to most humanists of the tweed and dust-jacket crowd. I think we need a distinction that better expresses how digital technology expands the humanities, rather than constraining them.

Dec 01 2010

Jamie asked about assignments for students using digital sources. It's a difficult question.

Nov 28 2010

Most intensive text analysis is done on heavily maintained sources. I'm using a mess, by contrast, but a much larger one. Partly, I'm doing this tendentiously: I think it's important to realize that we can accept all the errors due to poor optical character recognition, occasional duplicate copies of works, and so on, and still get workable materials.

Nov 18 2010

One more note on that Grafton quote, which I'll post below.

Nov 17 2010

I'm in Moscow now. I still have a few things to post from my layover, but there will be considerably lower volume through Thanksgiving.

Nov 12 2010

All right, let's put this machine into action. A lot of digital humanities is about visualization, which has its place in teaching (Jamie asked for more about that). Before I do that, though, I want to show more about how this can be a research tool. Henry asked about the history of the term "scientific method." I assume he was asking for a chart showing its usage over time, but with the data in hand I already have a lot of other interesting displays we can use. This post is a sort of catalog of some of the low-hanging fruit in text analysis.
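The basic "usage over time" chart for a phrase like "scientific method" comes down to counting hits per year and dividing by the total words printed that year. Here is a minimal stdlib sketch, assuming (hypothetically) that the corpus is available as `(year, text)` pairs and that a crude lowercase `[a-z]+` tokenizer is acceptable; my actual database is organized differently, and `phrase_per_million` is an illustrative name, not an existing tool.

```python
import re
from collections import Counter

def phrase_per_million(docs, phrase):
    """docs: iterable of (year, text) pairs. Returns {year: hits per million words}.

    Assumes a crude tokenizer: lowercase, keep only runs of a-z.
    """
    target = phrase.lower().split()
    n = len(target)
    hits, totals = Counter(), Counter()
    for year, text in docs:
        words = re.findall(r"[a-z]+", text.lower())
        totals[year] += len(words)
        # Slide an n-word window across the text and count exact matches.
        hits[year] += sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    # Normalize by that year's total word count so big years don't dominate.
    return {y: 1e6 * hits[y] / totals[y] for y in sorted(totals) if totals[y]}

docs = [
    (1850, "The scientific method is new."),
    (1850, "No method here."),
    (1900, "scientific method, scientific method"),
]
print(phrase_per_million(docs, "scientific method"))
```

Normalizing per million words matters because the number of books published grows enormously over the period, so raw counts would mostly chart the growth of print.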

Nov 10 2010

Obviously, I like charts. But I've periodically been presenting data as a number of random samples as well. It's a technique that can be important for digital humanities analysis, and one that draws more on the skills of humanistic training, so it might help make this sort of work more appealing. In the sciences, an individual data point often has very little meaning on its own; it's just a set of coordinates. Even in the big education datasets I used to work with, the core facts I was aggregating up from were generally very dull: one university awarded three degrees in criminal science in 1984, one faculty member earned $55,000 a year. But with language, there's real meaning embodied in every point, meaning that we're far better equipped to understand than the computer is. The main point of text processing is to act as a sort of extraordinarily stupid and extraordinarily perseverant research assistant, who can bring patterns to our attention but is terrible at telling which patterns are really important. We can't read everything ourselves, but it's good to check up periodically; that's why I do things like see what sort of words are the 300,000th most common in the language, or what 20 random book titles from the sample are.
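Both spot checks are easy to sketch. The version below is a minimal illustration under assumptions, not my actual pipeline: it assumes word counts are already in a `collections.Counter` and titles in a Python list, and the helper names `words_at_rank` and `random_titles` are hypothetical. A fixed seed makes a random sample reproducible, which helps if you want to show readers the same 20 titles you looked at.

```python
import random
from collections import Counter

def words_at_rank(counts, rank, window=5):
    """Return up to `window` words starting at frequency rank `rank` (1 = most common).

    Ties keep the order in which words were first counted (Counter.most_common).
    """
    ranked = [word for word, _ in counts.most_common()]
    start = max(rank - 1, 0)
    return ranked[start:start + window]

def random_titles(titles, k=20, seed=None):
    """Draw k titles without replacement; a fixed seed makes the sample repeatable."""
    rng = random.Random(seed)
    return rng.sample(titles, min(k, len(titles)))

# Toy data; a real vocabulary would be large enough to ask for rank 300,000.
counts = Counter("the of the method of the phlogiston".split())
print(words_at_rank(counts, 3, window=2))

titles = ["A", "B", "C", "D", "E"]
print(random_titles(titles, k=3, seed=0))
```

Reading the words at a deep rank, or a handful of random titles, is the "checking up on the research assistant" step: the computer supplies the sample, and we judge what it means.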